Leonard - EAKOS 2009

VAST 2009 Challenge
Challenge 1: Badge and Network Traffic

Authors and Affiliations:

Lorne Leonard, The Pennsylvania State University - Research Computing & Cyberinfrastructure, lorne_leonard@hotmail.com [PRIMARY contact]

Tool(s):

EAKOS is a collection of tools that demonstrates how one can interface with web-based visualization and GIS services. The toolset is an early prototype that Lorne Leonard developed in his spare time over the 2008 Christmas break and on weekends leading up to the competition deadline. Lorne works with researchers and faculty at The Pennsylvania State University and uses the toolset to demonstrate potential visualization and analytical solutions that support their research goals.

Video:

Leonard_Vast2009_Challenge1.mov

ANSWERS:


MC1.1: Identify which computer(s) the employee most likely used to send information to his contact in a tab-delimited table which contains for each computer identified: when the information was sent, how much information was sent and where that information was sent.  Please name the file Traffic.txt and place it in the same directory as your index.htm file.  Please see the format required in the Task Descriptions.

Traffic.txt


MC1.2:  Characterize the patterns of behavior of suspicious computer use.

I approached this challenge by identifying who went against policy and entered the building, and more importantly the classified room, without badging in. Using the "Prox Card" dataset, I plotted the sequence of daily prox card reader events for each employee ID, as demonstrated in Figure 1. Coding this tool took approximately three days. Days on which an ID "piggybacked" into or out of the classified room are highlighted in orange (for the entire day), and the piggyback event itself is linked with a yellow line, as shown in Figure 1.
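
The highlighting rule can be expressed as a small state machine over one ID's time-ordered prox events for a day. This is a minimal sketch, not the EAKOS code: it assumes the event types are named "prox-in-building", "prox-in-classified" and "prox-out-classified", and the helper find_piggybacks is hypothetical.

```python
from datetime import datetime

# Assumed event-type names; adjust to match the actual Prox Card export.
IN_CLASSIFIED = "prox-in-classified"
OUT_CLASSIFIED = "prox-out-classified"


def find_piggybacks(events):
    """Flag classified-room protocol breaks in one ID's prox events for one day.

    events: list of (datetime, event_type) tuples sorted by time.
    Returns a list of (datetime, description) tuples.
    """
    flags = []
    inside = False   # has the card reader recorded this ID inside the room?
    last_in = None   # time of the most recent prox-in-classified
    for when, kind in events:
        if kind == IN_CLASSIFIED:
            if inside:
                # Badged in again without badging out: the exit was piggybacked.
                flags.append((when, "Did not prox out-classified"))
            inside, last_in = True, when
        elif kind == OUT_CLASSIFIED:
            if not inside:
                # Badged out of a room never badged into: the entry was piggybacked.
                flags.append((when, "Did not prox in-classified"))
            inside = False
        # Other event types (e.g. building entry) do not change the room state.
    if inside:
        # The day ended with no recorded exit from the room.
        flags.append((last_in, "Did not prox out-classified"))
    return flags


# Made-up events: a building entry, then an exit from the classified room
# with no matching entry.
day = [
    (datetime(2008, 1, 10, 8, 0), "prox-in-building"),
    (datetime(2008, 1, 10, 10, 33), OUT_CLASSIFIED),
]
print(find_piggybacks(day))
```

Flagging at the level of (ID, day) mirrors the tool's behavior of highlighting the whole day once any break is found.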

Mini1.jpg

Figure 1: Prox Card data for ID 30 for the entire month.

Automatically highlighting the days on which IDs broke protocol made it quick (about 5 minutes) to scroll through the 60 IDs and establish that IDs 30, 38 and 49 piggybacked into or out of the classified room (Table 1). ID 30 was the worst offender, with three piggyback events into the classified room. What was this person doing between each piggyback event and the next recorded event?

ID  Day  Event                        Event Start Time  Next Event Time
30  10   Did not prox in-classified   10:33 AM          5:05 PM
30  17   Did not prox in-classified   11:31 AM          2:03 PM
30  24   Did not prox in-classified   9:00 AM           10:52 AM
38  4    Did not prox out-classified  1:12 PM           2:15 PM
49  8    Did not prox out-classified  12:56 PM          2:06 PM

Table 1: IDs that piggybacked into or out of the classified room.

All three events for ID 30 occurred on a Thursday. Could this be a prearranged schedule with his handlers?
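
The day-of-week observation is easy to confirm. A quick check, assuming the Prox Card day numbers refer to January 2008 (consistent with the AccessTime values in Table 3):

```python
import calendar

# Days of January 2008 on which ID 30 piggybacked into the classified room (Table 1).
days = [10, 17, 24]
weekdays = [calendar.day_name[calendar.weekday(2008, 1, d)] for d in days]
print(weekdays)  # ['Thursday', 'Thursday', 'Thursday'] in an English locale
```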

MIni1Figure2.jpg

Figure 2: Prox Card data for IDs 38 and 49 for the entire month.

To find who entered the building by piggybacking in the morning, I manually inspected which days lacked a grey marker at the start of the daily sequence. Generating the results in Table 2 took approximately 10 minutes.
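
The manual check for a missing marker at the start of the day could also be automated. This sketch assumes the grey marker corresponds to a "prox-in-building" event and that all records come from one month; morning_piggybacks is a hypothetical helper, not part of EAKOS.

```python
from collections import defaultdict
from datetime import datetime

BUILDING_IN = "prox-in-building"  # assumed event-type name


def morning_piggybacks(records):
    """Map each ID to the days on which its first prox event was not a
    building entry.

    records: iterable of (employee_id, datetime, event_type) tuples.
    Days are grouped by day of month, so all records should come from one month.
    """
    by_id_day = defaultdict(list)
    for emp, when, kind in records:
        by_id_day[(emp, when.day)].append((when, kind))

    flagged = defaultdict(list)
    for (emp, day), events in by_id_day.items():
        _, first_kind = min(events)  # earliest event of that day
        if first_kind != BUILDING_IN:
            flagged[emp].append(day)
    return {emp: sorted(days) for emp, days in flagged.items()}


# Made-up records for two IDs.
records = [
    (1, datetime(2008, 1, 5, 8, 50), "prox-in-building"),
    (1, datetime(2008, 1, 5, 9, 5), "prox-in-classified"),
    (1, datetime(2008, 1, 6, 9, 0), "prox-in-classified"),   # no building entry first
    (2, datetime(2008, 1, 6, 10, 0), "prox-out-classified"),
]
print(morning_piggybacks(records))
```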

ID  Day(s)
0   17
7   2
13  8, 23
27  24
37  24
38  3
39  24
48  23
49  8, 22, 31
50  30
51  2
54  16
55  16
58  31
59  31

Table 2: IDs that piggybacked into the building in the morning.

Having identified the IDs that piggybacked into or out of the classified room, I developed another tool to help identify ID connections, traffic amounts and event times. This took about four days to code, load the data and visualize. I visualized the "IP traffic" dataset in two ways. The first (Figure 3) plots source IPs against destination IP response and request sizes as a Treemap. This tool also includes the embassy floor plan showing each ID's assigned room, and a sequence plot, drawn against the floor plan, of each ID entering the building and moving in and out of the classified room. By referring to this sequence diagram and the results from the first tool above, I can identify when the piggybacking occurred. Furthermore, by mousing over the larger Treemap cells, I can see when the IP traffic happened and manually determine whether it falls near my time of interest.

The second visualization method (Figure 4) plots response versus request payloads per ID within a specified date range. Its purpose is to help determine whether a traffic event identified with the first method is an outlier. By mousing over the outliers, I can check whether they happened near the piggyback event. Collecting the data this way took approximately 30 minutes, and the results are shown in Table 3.
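
Two lookups described here, how often a source IP contacted each destination over the month (the Count column in Table 3) and which transfers fell near a piggyback time, can be sketched as follows. This is an illustrative sketch rather than the EAKOS code; the record fields mirror the Table 3 column names, and the function names and the two-hour window are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def pair_counts(records):
    """Count how often each (source IP, destination IP) pair occurs."""
    return Counter((r["SourceIP"], r["DestIP"]) for r in records)


def near(records, source_ip, moment, window=timedelta(hours=2)):
    """Return records from source_ip whose AccessTime is within +/- window of moment."""
    return [
        r for r in records
        if r["SourceIP"] == source_ip
        and abs(datetime.fromisoformat(r["AccessTime"]) - moment) <= window
    ]


# A few rows copied from Table 3 (only the fields used here).
records = [
    {"SourceIP": "37.170.100.30", "AccessTime": "2008-01-10T10:35:10.367", "DestIP": "10.30.138.140"},
    {"SourceIP": "37.170.100.30", "AccessTime": "2008-01-10T14:29:11.316", "DestIP": "101.160.27.28"},
    {"SourceIP": "37.170.100.30", "AccessTime": "2008-01-17T13:36:53.933", "DestIP": "10.30.138.140"},
]

print(pair_counts(records)[("37.170.100.30", "10.30.138.140")])
# Traffic within two hours of ID 30's 10:33 AM piggyback event on day 10 (Table 1).
print(near(records, "37.170.100.30", datetime(2008, 1, 10, 10, 33)))
```

On these three rows the pair count is only 2; the value of 29 in Table 3 would come from running the count over the full month's log.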

Mini1D.jpg

Figure 3: Traffic sizes shown as a Treemap, with each ID's movement sequence within the embassy.

Mini1F.jpg

Figure 4: Plotting response versus request payloads.
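
The outlier judgement in the second visualization method was made visually. One way to make it repeatable, swapped in here as an assumption rather than the method used, is Tukey's upper fence on payload size, Q3 + k·IQR. The helper name is hypothetical and `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def tukey_outliers(sizes, k=1.5):
    """Return the values above the upper Tukey fence, Q3 + k * IQR.

    sizes: payload sizes for one ID over the chosen date range (at least two values).
    """
    q1, _, q3 = statistics.quantiles(sizes, n=4)
    upper = q3 + k * (q3 - q1)
    return [s for s in sizes if s > upper]


# Made-up response sizes with one obvious outlier.
print(tukey_outliers([100, 110, 120, 130, 140, 150, 5000]))
```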

Source IP      AccessTime               DestIP           Socket  ReqSize  RespSize  Count
37.170.100.30  2008-01-10T10:35:10.367  10.30.138.140    80      3817     569386    29
37.170.100.30  2008-01-10T14:29:11.316  101.160.27.28    80      4202     335984    12
37.170.100.30  2008-01-10T15:00:15.208  100.104.83.89    80      44979    1751371   2
37.170.100.30  2008-01-10T15:39:57.166  105.77.95.226    80      56278    606589    3
37.170.100.30  2008-01-10T16:09:56.584  105.211.108.147  80      4953     1597330   10
37.170.100.30  2008-01-10T16:12:03.871  103.143.114.91   80      53081    599157    4
37.170.100.30  2008-01-10T16:49:38.494  10.124.235.51    80      44364    893780    4
37.170.100.30  2008-01-10T16:56:40.458  103.120.93.59    80      16568    264422    4
37.170.100.30  2008-01-17T12:38:41.768  104.73.180.170   80      6000     404896    21
37.170.100.30  2008-01-17T12:38:51.905  37.105.202.184   80      65533    27996     5
37.170.100.30  2008-01-17T13:36:47.489  103.76.60.0      80      5076     10384     2
37.170.100.30  2008-01-17T13:36:53.933  10.30.138.140    80      5627     371604    29
37.170.100.30  2008-01-24T14:46:16.832  100.226.208.157  80      4879     156342    13
37.170.100.30  2008-01-24T14:46:25.842  10.228.35.56     80      55712    55412     1
37.170.100.30  2008-01-24T17:18:16.094  10.30.138.140    80      6789     5104487   29
37.170.100.38  2008-01-04T17:28:43.475  37.109.133.151   80      63665    1880595   2
37.170.100.49  2008-01-08T16:21:30.114  106.192.237.252  80      5502     1356516   3

Table 3: Traffic from the IDs that piggybacked, with possible payload outliers.

As previously mentioned, ID 30 is the worst offender, with piggyback events on days 10, 17 and 24. The Count column in Table 3 gives the number of times the source IP contacted the destination IP during the month. The most suspicious destination IP is 10.30.138.140, where ID 30's exchanges involved large amounts of data on days 10, 17 and 24. Additionally, ID 30 contacts this destination regularly (29 times during the month) and appears to mask the activity by visiting another destination IP at nearly the same time (highlighted in yellow).
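
The "nearly the same time" pattern can be checked mechanically by looking for consecutive requests from the same source IP to different destinations only seconds apart. A minimal sketch, assuming records are (source IP, access time, destination IP) tuples and an arbitrary 15-second threshold; near_simultaneous is a hypothetical helper, and the rows are copied from Table 3.

```python
from datetime import datetime, timedelta


def near_simultaneous(records, gap=timedelta(seconds=15)):
    """Return (earlier_dest, later_dest) for consecutive requests from the same
    source IP that go to different destinations less than `gap` apart."""
    ordered = sorted(records, key=lambda r: (r[0], r[1]))  # by source, then time
    pairs = []
    for a, b in zip(ordered, ordered[1:]):
        if a[0] == b[0] and a[2] != b[2] and b[1] - a[1] < gap:
            pairs.append((a[2], b[2]))
    return pairs


# ID 30 rows for days 17 and 24 from Table 3.
rows = [
    ("37.170.100.30", "2008-01-17T12:38:41.768", "104.73.180.170"),
    ("37.170.100.30", "2008-01-17T12:38:51.905", "37.105.202.184"),
    ("37.170.100.30", "2008-01-17T13:36:47.489", "103.76.60.0"),
    ("37.170.100.30", "2008-01-17T13:36:53.933", "10.30.138.140"),
    ("37.170.100.30", "2008-01-24T14:46:16.832", "100.226.208.157"),
    ("37.170.100.30", "2008-01-24T14:46:25.842", "10.228.35.56"),
    ("37.170.100.30", "2008-01-24T17:18:16.094", "10.30.138.140"),
]
records = [(src, datetime.fromisoformat(t), dst) for src, t, dst in rows]
for pair in near_simultaneous(records):
    print(pair)
```

Run over the full log rather than this excerpt, the same check would also surface innocent coincidences, so the threshold is a starting point for inspection, not a verdict.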

There is no network activity near the piggyback events for IDs 38 and 49; perhaps they left the building carrying the data. Most of the larger response payloads occur either before the piggyback event or some time after a later visit to the classified room. One possible scenario is that either ID left the building to meet a contact, discuss the collected data and confirm that it met the contact's needs, and then transmitted the information later the same day. If so, the transfer may have been the one at 2008-01-04T17:28:43.475 to 37.109.133.151 for ID 38 and the one at 2008-01-08T16:21:30.114 to 106.192.237.252 for ID 49. However, compared with each ID's payload events for the entire month, these amounts do not appear significant (Figure 5).
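
The comparison against the whole month can be expressed as a percentile rank: the share of an ID's monthly events whose request size is at or below the candidate's. A sketch, assuming the month's request sizes for the ID are available as a list; percentile_rank is a hypothetical helper and no values from the dataset are computed here.

```python
from bisect import bisect_right


def percentile_rank(month_sizes, candidate):
    """Percentage of month_sizes that are less than or equal to candidate."""
    ordered = sorted(month_sizes)
    return 100.0 * bisect_right(ordered, candidate) / len(ordered)


# Made-up monthly request sizes for one ID and a candidate event.
print(percentile_rank([10, 20, 30, 40], 30))
```

A candidate that ranks well below the top few percent of an ID's own traffic would support the reading that it is not significant; where to draw that line is a judgment call.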

image017.jpg

image019.jpg

Figure 5A: Possible event for ID 38 compared with the entire month of events. The blue circle denotes the request size at 1/4/2008 5:28 PM.

Figure 5B: Possible event for ID 49 compared with the entire month of events. The blue circle denotes the request size at 1/8/2008 4:21 PM.